Test-Free Person Measurement with the Rasch Simple Logistic Model
Howard E. A. Tinsley and René V. Dawis
University of Minnesota 

Rasch (1960) has proposed a simple logistic model for tests of intel- 
ligence or attainment which hypothesizes that the probability of a correct 
response to an item is a function of the ability of the person and the 
difficulty of the item. Rasch has been able to demonstrate mathematically 
that his model allows the separation and the independent estimation of
these two parameters. Thus, in theory, given a set of calibrated items which 
fit his model, one may calculate ability estimates on the same scale from 
responses to any subset of items. This means that alternative or partial 
forms of a test may be scored on a common scale. Comparable scores presum- 
ably can be obtained even when the same items were not administered to all 
subjects, thereby making possible the individualized administration of tests 
in which only those items relevant to the examinee's ability level are
administered. In short, the Rasch simple logistic model makes possible what
Wright (1968) has characterized as test-free person measurement. If these 
claims are substantiated, tests developed in accordance with the Rasch model 
would represent a marked improvement over tests developed in accordance with 
classical psychometric theory. 
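
As a minimal sketch of the model described above, the probability of a correct response can be written as a logistic function of the difference between person ability and item difficulty. The function name and the choice to express both parameters in logits are illustrative assumptions; the paper itself does not give this notation.

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch simple logistic model.

    Assumption: ability and difficulty are expressed on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))
```

When ability equals difficulty the probability is one half, and it rises toward one as ability exceeds difficulty.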

Although introduced in 1960, this aspect of the Rasch simple logistic 
model has been virtually ignored. Several investigators have studied the 
use of the model for item calibration (Anderson, Kearney, & Everett, 1968; 
Brooks, 1965; Rasch, 1960; Tinsley & Dawis, 1972a, 1972b; and Wright, 1968),
but the work of Wright (1968) represents the only investigation the present
authors were able to find which attempts to determine whether the model leads 
to test-free person measurement. Wright's research is based upon the respon- 
ses of 976 beginning law students to 48 reading comprehension items on the 
Law School Admission Test. Wright divided the original 48-item test into two 
sub-tests, one containing the 24 easiest items, the other containing the 24 
hardest items. For each subject, Wright calculated his raw score and his 
Rasch ability estimate on the two tests. He then calculated the difference 
between the two raw scores and the difference between the two ability esti- 
mates, and compared the distribution of the differences for the two types 
of scores. Wright points out that the distribution of differences for raw
scores, with a mean of 6.78 and a standard deviation of 3.30, is almost
entirely above zero (see Table 1). On the other hand, the distribution of
differences in Rasch ability estimates, with a mean of .061 and a standard 
deviation of .749, is centered around zero. Wright (1968) concludes that 
the alternative Rasch ability estimates seem to be in agreement. 

Insert Table 1 about here.

Wright goes a step further with the Rasch ability estimates. For each 
individual, he divides the difference between the two ability estimates by 
the measurement error of this difference. This produces what Wright calls
the distribution of standardized differences, with a mean of .003 and a
standard deviation of 1.014. Wright concludes from these data that the only
variation observed in ability estimates is of the same magnitude as that
expected from the standard error of measurement in the test, and that these
data support the claim that the Rasch simple logistic model allows the
measurement of a person with any set of calibrated items.
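
The standardized difference described above can be sketched as follows. The paper does not state how Wright obtained the measurement error of the difference; the version here assumes the two estimates have independent errors, so that the error of the difference is the root of the summed squared standard errors. The function and argument names are hypothetical.

```python
import math

def standardized_difference(est_a, est_b, se_a, se_b):
    """Difference between two ability estimates in units of its standard error.

    Assumption: the errors of the two estimates are independent, so the
    standard error of the difference is sqrt(se_a**2 + se_b**2).
    """
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
```

If measurement error were the only source of disagreement, values computed this way over a sample would be expected to have a mean near zero and a standard deviation near one.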

Two problems with this investigation must be noted. First, the results 
were biased in favor of the Rasch model when Wright chose to summarize the
difference between scores on the two tests in terms of the mean. Because the raw
scores are all positive, differences in raw scores will all be positive. The
Rasch ability estimates are logarithms, however, half of which are negative. 
Approximately half the differences in logarithmic ability estimates will be 
negative, with the result that the mean difference in logarithmic ability will 
be close to zero. Use of the absolute value of the differences would have 
avoided this problem. The results were further biased in favor of the Rasch 
model when Wright utilized the standardized difference in the logarithmic 
ability estimates without doing so for the difference in raw scores.
Computation of the mean standardized absolute difference for both types of scores
would have been preferable. 
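
The cancellation at issue can be illustrated with a small example. The six differences below are invented for illustration and are not data from either study; they are chosen to be symmetric about zero.

```python
from statistics import mean

# Hypothetical differences between two ability estimates for six examinees.
# These values are invented for illustration only.
differences = [-0.9, -0.6, -0.3, 0.3, 0.6, 0.9]

# Signed differences cancel, so their mean hides the disagreement.
signed_mean = mean(differences)

# Absolute differences cannot cancel, so their mean reflects it.
absolute_mean = mean(abs(d) for d in differences)
```

Here the mean signed difference is zero while the mean absolute difference is 0.6, even though no examinee received the same estimate twice.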

The assertion, then, that the Rasch simple logistic model allows test-free
person measurement remains largely unsubstantiated. Clearly, this
question deserves considerable attention. The purpose of this research was 
to investigate this claim. 

Method 

Instruments. Four analogy tests, combined into two test booklets, were
utilized in this study. The first test booklet contained a 60-item word
analogy test followed by a 40-item symbol analogy test. The second test
booklet contained a 60-item number analogy test followed by a 50-item
picture analogy test. All items were of the multiple choice type with five
response alternatives and with the blank in the item stems distributed among
the four positions. All tests were introduced by one standard page of test
instructions. 

Subjects. Two samples of subjects were employed in this study. College
students enrolled in an introductory psychology class at the University of 
Minnesota during the Fall of 1970 constituted the first sample. All were vol- 
unteers (obtained through the subject pool of the Department of Psychology) 
who were participating in the research to gain additional points toward their 
course grade. Some students completed only one of the test booklets while 
others completed both of them. High school students enrolled in two suburban 
Twin Cities high schools constituted the second sample. Each student com- 
pleted one test booklet. In both high schools, the test booklets were com- 
pleted by students in the classes of those teachers who volunteered to par- 
ticipate in the study. 

Because the test forms were designed to be self-explanatory, subjects
were simply given the test and instructed to read the directions and complete
the test. The test administrator was always available, however, to
answer any questions. No time limits for completion of the test were set,
but students in the high schools were allowed only one fifty-minute class
period in which to complete the test.

Analysis. The procedure for such an investigation need not be
complicated. First, a sample of subjects must be administered two tests of the
same ability, composed of items which have been calibrated on a common scale. 
Then, scores on these two tests must be converted to ability estimates on a 
common scale. These ability estimates should be approximately the same, with 
errors of measurement accounting for all the differences. Four such
comparisons were made in this study, one each with word, picture, number, and
symbol analogies. In each case, the samples of high school students and
college students were combined. Next, each test was divided into two subtests.
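
The conversion of a raw score to an ability estimate on the common scale can be sketched as below. This is a generic maximum likelihood solution by Newton-Raphson iteration, offered as an assumption about how such estimates may be obtained; it is not the program used in this study, and the function name, the step limit, and the tolerance are hypothetical.

```python
import math

def estimate_ability(raw_score, difficulties, tol=1e-8, max_iter=100):
    """Maximum likelihood ability (in logits) for a raw score on calibrated items.

    Solves sum_i P_i(theta) = raw_score, where P_i is the Rasch probability
    of a correct response to item i. Perfect and zero scores have no finite
    solution and are rejected.
    """
    n = len(difficulties)
    if not 0 < raw_score < n:
        raise ValueError("raw score must be strictly between 0 and the item count")
    theta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - d))) for d in difficulties]
        expected = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        step = (raw_score - expected) / information
        # Limit the step to one logit to keep the iteration stable (a design choice).
        step = max(-1.0, min(1.0, step))
        theta += step
        if abs(step) < tol:
            break
    return theta
```

Because the item difficulties enter the calculation, a low raw score on hard items and a high raw score on easy items can map to the same ability estimate, which is the sense in which the two subtests share a common scale.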

The subdivision of the word, picture, and symbol analogy tests was
straightforward. First, the items in the total test were arranged in the order of
their easiness. Then they were divided into two subtests with one subtest 
containing the hard items, the other the easy items. Because there were so 
many easy items in the number analogy test, this procedure was amended 
slightly. After the number analogies had been arranged in order of their 
easiness, the 25 easiest items were assigned to one subtest. Then items 
26 through 35 were assigned to the second subtest. Items 36 through 40 were
then assigned to the first subtest and items 41 through 60 were placed in the 
second subtest. This procedure was necessary because the ceiling on a sub- 
test composed of the thirty easiest number analogies was so low that many 
subjects would have received perfect scores, necessitating their elimination 
from the study. 
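
The two splitting procedures can be sketched as follows. The function names are hypothetical, and the assumption that the straightforward split produced two equal halves is not stated in the text. The amended split follows the item ranks given above for the 60 number analogies.

```python
def split_by_easiness(items_easiest_first):
    """Split a list ordered from easiest to hardest into easy and hard halves.

    Assumption: the two subtests are of equal length.
    """
    half = len(items_easiest_first) // 2
    return items_easiest_first[:half], items_easiest_first[half:]

def split_number_analogies(items_easiest_first):
    """Amended split for the number analogies, using 1-based easiness ranks.

    Ranks 1-25 and 36-40 form the first subtest; ranks 26-35 and 41-60
    form the second subtest.
    """
    first = items_easiest_first[0:25] + items_easiest_first[35:40]
    second = items_easiest_first[25:35] + items_easiest_first[40:60]
    return first, second
```

With 60 items the amended split yields two 30-item subtests, the first still easier on the whole but with a higher ceiling than the 30 easiest items alone.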

After the tests had been divided into subtests, the raw score, percentile
rank, and Rasch ability estimate of each subject were computed for the two
subtests. These item characteristics were computed using a program developed by
Wright and Panchapakesan (1969, 1970) and modified by Bart, Lele, and Rosse
(1970) for use on the University of Minnesota CDC 6600 computer. Finally,
the product-moment correlation and the mean and standard deviation of the
absolute difference between the scores on the two subtests were computed for the
raw scores, percentile ranks, and Rasch ability estimates. Support for the
hypothesis that the ability estimates are invariant with respect to the
easiness of the items in the test would be indicated if the correlation between
ability estimates on the two tests approaches unity and the distribution of
the absolute differences between ability estimates on the two tests centers
around zero.
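
The two summary statistics used in this comparison can be sketched as below. The function names are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was computed.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def absolute_difference_summary(x, y):
    """Mean and sample standard deviation of |x_i - y_i| across examinees."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    return mean(diffs), stdev(diffs)
```

Applying both functions to the easy-subtest and hard-subtest scores of the same examinees, once for each type of score, gives the quantities of the kind reported in Tables 3 and 4.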

In each case, the sample for a given test consisted of those college 
and high school students who had completed the test, minus those whose score 
on the total test was lower than the r index recommended by Panchapakesan 
(1969), and minus those who received a perfect or a zero score on either of
the subtests. The r index is an index suggested for the identification of
subjects with scores so low that guessing may have been a factor in deter- 
mining their ability estimates. Thus, only those subjects for whom guessing 
was not a factor were included in this analysis. Table 2 indicates the num- 
ber of examinees excluded from this study and the number remaining. 
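
The bookkeeping behind the final samples can be checked with a short calculation. The counts are those read from Table 2; the dictionary layout and variable names are illustrative assumptions.

```python
# Counts as read from Table 2:
# (initial sample, excluded for low total score, excluded for perfect subtest score)
exclusions = {
    "Word": (949, 62, 22),
    "Picture": (612, 14, 8),
    "Number": (626, 36, 10),
    "Symbol": (938, 83, 21),
}

# Final sample = initial sample minus both groups of excluded examinees.
final_sample = {
    test: initial - low_total - perfect_subtest
    for test, (initial, low_total, perfect_subtest) in exclusions.items()
}
```

The resulting final samples are 865, 590, 580, and 834 for the word, picture, number, and symbol analogies respectively.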

Insert Table 2 about here. 

Results 

The invariance of raw scores, percentile ranks, and Rasch ability esti- 
mates was investigated. If raw scores differ only by a constant associated 
with the difference in the difficulty of the test, the correlation between
the two sets of raw scores should approach unity and the mean of the
distribution of absolute differences should be the constant. But if this is true,
conversion of the raw scores to percentile ranks, separately for each sub- 
test, should be an effective method for equating subtest scores. Accordingly, 
the correlation between the two sets of percentile ranks should also approach 
unity, but the mean of the distribution of absolute differences in the sub- 
test percentile rank scores should approach zero. In practice, however, the 
above result is seldom observed. Scores differ by a variable rather than a 
constant amount. Measurement by the Rasch model supposedly avoids this
problem. Since the items in the subtests were calibrated on a common scale,
the Rasch ability estimates from the two subtests should be on a common scale. 
This means that there should be no difference in the scores of the two sub- 
tests. The correlation between scores on the two subtests should approach 
unity and the mean of the distribution of absolute differences in scores 
should approach zero. 
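
The argument about percentile ranks can be illustrated with a short sketch. The text does not say which definition of percentile rank was used; the version below assumes the common mid-rank definition expressed as a proportion between 0 and 1, and the function name is hypothetical.

```python
def percentile_ranks(scores):
    """Percentile rank of each score within its own distribution.

    Assumption: rank = (number scoring below + half the number tied) / n,
    expressed as a proportion rather than a percentage.
    """
    n = len(scores)
    ranks = []
    for s in scores:
        below = sum(1 for t in scores if t < s)
        tied = sum(1 for t in scores if t == s)
        ranks.append((below + 0.5 * tied) / n)
    return ranks
```

If the scores on one subtest equal those on the other plus a constant, the two sets of percentile ranks are identical, so every absolute difference is zero. Any nonzero mean absolute difference in percentile ranks therefore indicates that the raw scores differ by more than a constant.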

Table 3 gives the correlations between the scores on the four types 
of subtests. The highest correlations were observed between scores on 
the word analogy subtests, with raw scores and percentile ranks correlating 
.68 and Rasch ability estimates correlating .67. Intermediate correlations
were observed for the picture and number analogy subtests. For the
picture analogies, raw scores correlated .47, percentile ranks correlated .50,
and Rasch ability estimates correlated .48; the corresponding
correlations for number analogies were .47, .51, and .51. The lowest
correlations occurred for symbol analogies. Raw scores correlated .27, percentile
ranks correlated .30, and Rasch ability estimates correlated .27. 

Insert Table 3 about here.

Table 4 indicates the mean and standard deviation of the distribution 
of absolute differences in subtest scores for each of the four tests. The 
mean difference in raw scores ranged from 9.25 for symbol analogies to 
12.56 for number analogies with the mean varying between 3.0 and 3.5 stand- 
ard deviations above zero. The mean differences in percentile ranks were 
.18 for word analogies, .22 for number and picture analogies, and .27 for 
symbol analogies, and varied between 1.2 and 1.3 standard deviations above 
zero. The mean differences in Rasch ability estimates were .55 and .57 for
word and picture analogies, .72 and .73 for number and symbol analogies,
and, like the mean differences for percentile ranks, varied between 1.2 and
1.3 standard deviations above zero.
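
The statement that each mean lies a given number of standard deviations above zero can be checked by dividing each mean in Table 4 by its standard deviation. The sketch below assumes the Table 4 entries are read as mean and standard deviation pairs, in the order word, picture, number, symbol; the variable names are illustrative.

```python
# (mean, standard deviation) of absolute differences, as read from Table 4,
# in the order word, picture, number, symbol.
table4 = {
    "Raw scores": [(10.14, 3.35), (10.43, 2.97), (12.56, 3.70), (9.25, 2.92)],
    "Percentile rank": [(0.18, 0.15), (0.22, 0.18), (0.22, 0.18), (0.27, 0.21)],
    "Rasch": [(0.55, 0.42), (0.57, 0.47), (0.72, 0.57), (0.73, 0.56)],
}

# Distance of each mean from zero in standard deviation units.
ratios = {
    measure: [round(m / sd, 2) for m, sd in pairs]
    for measure, pairs in table4.items()
}
```

The ratios for raw scores fall between about 3.0 and 3.5, and those for percentile ranks and Rasch ability estimates between about 1.2 and 1.3, in agreement with the text.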

Insert Table 4 about here. 

Discussion 

One of the most promising features of the Rasch model is that it would 
make possible the individualization of measurement. Once a pool of items cali- 
brated on a common scale has been developed, individuals need complete only 
those items appropriate to their ability level and their scores can be con- 
verted to ability estimates on a common scale. This means that the scores of 
the individuals can be compared even if the tests they completed do not have 
one single item in common. It was with this feature of the Rasch model that 
this research was concerned. 

This research investigated the hypotheses that raw scores, percentile 
ranks and Rasch ability estimates are invariant with respect to the items 
used in measurement. The data indicate that there is little difference among 
the three ability measures; all three are dependent upon the items used in 
measurement. However, this finding is misleading, a reflection of the
inadequacy of the research design. In the first place, it is illogical to
assume that tests which do not fit the Rasch model will still have the
characteristics attributed to it. Only one of the eight subtests used in this 
research had a Rasch maximum likelihood probability greater than .05. The 
probability of the easy picture subtest was .03 and the probability of the 
hard symbol subtest was .44. The maximum likelihood probability of the
remaining six subtests was less than .001. There is no reason, therefore, to
expect that results based on these tests will possess the properties of the
Rasch model.

Another problem with this research design concerns the method of admin- 
istering the test questions. The goal of the Rasch model is to measure the 
individual as accurately as possible. The precision of the measurement
depends on the number of items used in the measurement and the appro- 
priateness of the items for the ability of the examinee (Panchapakesan, 1969). 
If the use of the Rasch model is to lead to more precise measurement, the 
standardized method of item presentation in which each examinee answers every 
question must be abandoned. Take, for example, the case of a low-ability
subject. Many of the items on the easy subtest were no doubt appropriate 
for measuring his ability. It is even possible that his ability was rather 
precisely estimated in this subtest. In contrast, most of the questions on 
the hard subtest were inappropriate for this examinee. Each of the questions 
gave very little information about his ability and the resulting ability
estimate was based upon very little information. Consequently, the two ability
estimates would have very little chance of agreeing. 
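
The loss of precision for a poorly matched examinee can be sketched numerically. Under the Rasch model each item contributes information p(1 - p), where p is the probability of a correct response, and the standard error of the ability estimate is approximately the reciprocal of the square root of the summed information. The item calibrations and the ability value below are invented for illustration and are not taken from this study.

```python
import math

def standard_error(ability, difficulties):
    """Approximate standard error of an ability estimate: 1 / sqrt(information)."""
    info = 0.0
    for d in difficulties:
        p = 1.0 / (1.0 + math.exp(-(ability - d)))
        info += p * (1.0 - p)  # information contributed by one item
    return 1.0 / math.sqrt(info)

# Hypothetical calibrations in logits, invented for illustration only.
easy_subtest = [-2.0, -1.5, -1.0, -0.5, 0.0]
hard_subtest = [1.0, 1.5, 2.0, 2.5, 3.0]
low_ability = -1.0

se_easy = standard_error(low_ability, easy_subtest)
se_hard = standard_error(low_ability, hard_subtest)
```

For this hypothetical low-ability examinee the standard error on the hard subtest is roughly twice that on the easy subtest, although both subtests contain the same number of items.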

If the research design for this study is inappropriate, how is it that
Wright (1968) achieved satisfactory results using essentially the same de- 
sign? It has already been suggested that Wright analyzed his data incor- 
rectly. Wright reported the mean and standard deviation of the distribution 
of differences, where the mean and standard deviation of the absolute differ- 
ences would have been more appropriate. Table 5 presents the means and stand- 
ard deviations of the distributions of signed differences for the data reported 
in this study. The results represent Wright's (1968) method of analysis and 
can be compared with those presented in Table 4. The results for word,
picture, and symbol analogies, when looked at in this manner, compare favorably
with those reported by Wright (see Table 1). Wright (1968, pp. 95-96)
interprets his results as indicating that the Rasch simple logistic model yields
item-free person measurement. It has been shown, however, that these results 
are artifacts of the method of analysis employed. 

Insert Table 5 about here. 

The research design, then, was inappropriate for testing the hypothesis 
that Rasch ability estimates are invariant with respect to the items used in 
measurement. A successful test of this hypothesis requires a procedure for 
the individualized administration of items. Subtests could be constituted 
from odd-numbered vs. even-numbered items, after ordering all items according 
to easiness. A stringent test of the hypothesis could still be obtained by
estimating an individual's ability on two subtests, one consisting of largely 
inappropriate items (e.g., very easy items), the other consisting of items 
appropriate to the ability of the examinee. In both cases, testing would 
continue until a specified precision of measurement was achieved. If the 
hypothesis is supported, the two ability estimates would be identical within 
the limits of error allowed by the precision of measurement. 
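
The idea of testing until a specified precision is reached can be sketched as a simple stopping rule. The sketch assumes, for illustration only, that the examinee's ability is known in advance, which would not be the case in practice; a working procedure would update a provisional estimate after each response. The function name and the rule of taking the best-matched items first are hypothetical.

```python
import math

def items_until_precision(ability, difficulties, target_se):
    """Number of items needed to reach target_se, best-matched items first.

    Returns None if the whole item pool cannot reach the target precision.
    Assumption: ability is treated as known, purely for illustration.
    """
    ordered = sorted(difficulties, key=lambda d: abs(d - ability))
    info = 0.0
    for count, d in enumerate(ordered, start=1):
        p = 1.0 / (1.0 + math.exp(-(ability - d)))
        info += p * (1.0 - p)
        if 1.0 / math.sqrt(info) <= target_se:
            return count
    return None
```

With items matched exactly to the examinee's ability, four items suffice for a standard error of 1.0 logit, whereas ten items located three logits away do not reach that precision. A subtest of inappropriate items would therefore have to be much longer to support a fair comparison.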

Table 1

Mean, Standard Deviation of Differences in Scores
on Easy and Hard Tests
(N = 976)

Ability                      Standard
Estimate          Mean       Deviation

Raw scores        6.78        3.30
Rasch             .061        .749

Table 2

Sample Size

                                    Excluded
Analogy      Initial     Low Total     Perfect Subtest     Final
Test         Sample      Score         Score               Sample

Word          949           62              22              865
Picture       612           14               8              590
Number        626           36              10              580
Symbol        938           83              21              834






Table 3

Correlation of Subtest Scores

Ability                         Analogy Test
Estimate             Word     Picture     Number     Symbol

Raw Score             .68       .47         .47        .27
Percentile Rank       .68       .50         .51        .30
Rasch                 .67       .48         .51        .27

Table 4

Mean, Standard Deviation of Absolute Differences
in Subtest Scores

Ability                                   Analogy Test
Estimate            Word            Picture         Number          Symbol

Raw Scores          10.14 ± 3.35    10.43 ± 2.97    12.56 ± 3.70    9.25 ± 2.92
Percentile Rank       .18 ± .15       .22 ± .18       .22 ± .18      .27 ± .21
Rasch                 .55 ± .42       .57 ± .47       .72 ± .57      .73 ± .56

Table 5

Mean, Standard Deviation of Signed Differences
in Subtest Scores

Ability                                   Analogy Test
Estimate            Word            Picture         Number          Symbol

Raw Scores          10.14 ± 3.36    10.42 ± 3.00    12.55 ± 3.76     9.24 ± 2.95
Percentile Rank      .007 ± .238     .003 ± .288     .017 ± .286    -.008 ± .343
Rasch                .047 ± .696     .094 ± .733     .196 ± .901     .038 ± .916